1. INTRODUCTION

Stress is well known to modify the balance of the autonomic nervous system and to affect a variety of
physiological processes in the human body. Stress increases the risk of injuries and death due to
loss of concentration or risky behaviours. Work-related stress is a significant factor in triggering low
performance, depression, and illnesses. Chronic stress exposure could impair the immune system and increase
the risks of cardiovascular diseases, neurological disorders, etc. In the educational context and the context of
collaborative human-machine activities, work-related stress due to high cognitive load results in low
performance and low efficiency on tasks that are otherwise easy to cope with.

The emotional and mental response to high cognitive load varies widely among people. Reduced
examination results were reported to be often due to an increased stress level [1]. Stress exposure
was reported to cause low physical activity, increased consumption of sweets, sleep problems, and
musculoskeletal pains [2]. Higher stress levels are associated directly with physiological changes and
indirectly with poor health behaviours [3]. Unfortunately, direct stress level assessment is challenging,


Journal homepage: http://beei.org 


25400 ISSN: 2302-9285 


mainly due to the individuality of responses and the individual stress coping capacity. The physiological
stress response is characterised by the activation of the hypothalamic-pituitary-adrenal (HPA) axis, which
triggers various physiological processes to cope with a stressful situation and recover homeostasis [4]. In
addition to cortisol release, stress also alters the serum level of leptin (an anti-obesity hormone).
D. J. Haleem et al. [4] showed that serum leptin levels are positively linked with academic performance and
proposed serum leptin levels as a stress perception biomarker. Other studies demonstrated that physiological
signals could be used as indicators of stress and health status. Significant progress in detecting stress
levels during driving was made by J. A. Healey and R. W. Picard [5], who utilised electrocardiography (ECG),
electromyography (EMG), and electrodermal activity (EDA) signals. J. Kim and E. Andre [6] developed an
automated system that recognises emotions in several classes. These and many other studies paved the way for
the wide use of technology in health and well-being monitoring applications. Compared to subjective
assessment based on questionnaires and intrusive medical methods, physiological approaches perform better in
terms of efficiency, non-intrusiveness, and diagnostic ability.

Recent technological advances provided the means for the emergence of a great number of e-Health
oriented services and applications [7]-[16]. Due to these advances, application developers can nowadays
quickly design new products and services that incorporate a range of novel functionalities, such as
continuous health monitoring, on-demand measurement of physiological states and conditions [17], data
collection [18], and others. Building on robust machine learning methods and signal processing algorithms,
which over the past decade matured into reliable instruments in laboratory conditions [19], these new
functionalities have the potential to deliver significant social impact. This is not only because these
innovations aim to improve medical diagnostics and treatment practices but also because they have the
potential to redefine the overall workflow in national healthcare systems. Although recent advances in
wearable devices allow monitoring of physiological stress indicators such as heart rate, sweating rate, and
blood pressure, at this time there are no reliable direct methods for monitoring the affective and
behavioural stress response. Generally, current stress-detection systems rely on physiological responses to
stress or emotional stimuli, which are used to train machine-learning models that predict a subject's
affective state or emotions.

In recent years, many researchers have made efforts to create databases and models for recognising
negative emotions, cognitive load, and stress based on physiological signals. Some of the well-known ones are
DEAP [20], MAHNOB-HCI [21], and ASCERTAIN [22]. These are multimodal datasets that encompass
recordings of physiological signals from healthy subjects, induced by purposely selected audio-visual stimuli
or cognitive tasks. Specifically, the DEAP dataset contains recordings of the emotional reaction of 32
participants. It provides over one hundred features extracted from electroencephalography (EEG), EDA, skin
surface temperature, EMG, electrooculography (EOG), respiration, and photoplethysmography (PPG)
signals. The MAHNOB-HCI dataset comprises face and body video records, eye gaze and audio signals, and EEG,
EDA, ECG, respiration, and skin temperature recordings from 27 participants. The ASCERTAIN dataset includes
recordings from 58 participants and is oriented towards evaluating the emotion-personality relationship and
affect recognition. Two hundred features were extracted from ECG, EDA, EEG, and EMO signals for
modelling emotional states and detecting five personality traits. The WESAD dataset is
focused on wearable stress and affect detection. All physiological modalities are acquired in an
experimental setup inducing three affective states: neutral (neutral reading task), stress (Trier Social
Stress Test), and amusement (funny video clips).

In the present study, we aim to recognise acute stress caused by cognitive tasks. Building on previous
research, we developed an automated detector for this purpose and evaluated it on a range of cognitive tasks.
The novel aspects, described in section 2, consist in the design and implementation of the signal
pre-processing and feature extraction stages, which were purposely crafted and fine-tuned for the specific
needs of acute stress detection. To validate the detector, we experimented with three types of cognitive tasks
characterised by different levels of abstraction, difficulty, and domain-specific knowledge (sections 3 and 4).


2. RESEARCH METHOD 

The overall concept of the proposed automated detector of acute stress, based on evidence extracted
from PPG and EDA signals, is shown in Figure 1. As shown, the workflow follows the conventional two-stage
machine-learning strategy, consisting of a compact information representation stage and a classification
stage. The information extraction process starts with physiological signal pre-processing, followed by peak
detection and feature extraction steps. Next, the EDA- and PPG-based features are subjected to
post-processing, which involves dynamic range normalisation and subset selection. The feature selection step
reduces the feature vector size and eliminates the less relevant and redundant features. The feature vectors
obtained in this manner are then fed to a classifier trained to discriminate between acute stress and other
emotional conditions.

Figure 1. The overall concept of the automated acute stress detector


2.1. Peak detection 

In brief, the significant peaks in the PPG and EDA signals are identified via purposely developed
signal-specific peak detectors. Specifically, the PPG peak detection algorithm was inspired by the concept of
the Mountaineer's peak detection algorithm [23]; however, the actual processing steps and their
implementation differ significantly from the original method. The central idea behind our PPG peak detection
algorithm (Figure 2) is a signal follower based on the backward differences of amplitude, which seeks rising
edges; the endpoints of these edges become candidate peak locations. The list of candidate peaks is then
refined to eliminate those not located within a pre-specified expected range. Peaks suspected of being
inaccurately positioned are identified via a procedure for detection of peak candidates, followed by a
two-step procedure for fine-tuning. This process brings advantages in terms of noise robustness, accuracy of
peak detection, and computational efficiency. The proposed algorithm does not require detrending or artefact
removal, which makes it easy to implement on a variety of mobile platforms. The algorithm is free of complex
adjustments and fine-tuning.





Figure 2. The overall concept of the systolic peak detection algorithm for PPG signals 
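
The signal-follower idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `detect_peaks`, the `min_interval_s` parameter, and the single refinement rule (keep the taller of two candidates that are too close) are assumptions made for illustration, and the two-step fine-tuning procedure is not reproduced.

```python
import math


def detect_peaks(signal, fs, min_interval_s=0.3):
    """Return indices of candidate peaks in a uniformly sampled signal.

    min_interval_s is a hypothetical refractory period, not a value from the paper.
    """
    min_gap = int(min_interval_s * fs)
    candidates = []
    rising = False
    for n in range(1, len(signal)):
        diff = signal[n] - signal[n - 1]   # backward difference of amplitude
        if diff > 0:
            rising = True                  # the follower is climbing a rising edge
        elif rising:
            candidates.append(n - 1)       # the edge ended: its endpoint is a candidate
            rising = False

    # Refinement: when two candidates are closer than min_gap, keep the taller one.
    peaks = []
    for idx in candidates:
        if peaks and idx - peaks[-1] < min_gap:
            if signal[idx] > signal[peaks[-1]]:
                peaks[-1] = idx
        else:
            peaks.append(idx)
    return peaks


# Synthetic 1.2 Hz wave sampled at 50 Hz for 5 s (illustrative, not real PPG data).
fs = 50
wave = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(5 * fs)]
peaks = detect_peaks(wave, fs)
```

Because the follower only compares neighbouring samples, the sketch needs no detrending, which mirrors the property claimed for the proposed detector.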


The EDA signal peak detector [24] aims to identify the skin conductance response (SCR) peaks using front
slope detection. In the first processing step, we separate the SCR and skin conductance level (SCL)
components of the EDA signal (cf. Figure 3). Next, we search for the rising edges using a signal follower.
If the distance between two rising edges is smaller than the threshold Tr (in samples), each peak
candidate's amplitude is compared to a predefined amplitude threshold. The final processing step checks for
zero crossings before and after each peak candidate.





Figure 3. The overall concept of the SCR peak detection algorithm for EDA signals


2.2. Feature extraction

Let us consider a PPG signal with a duration of 50 sec. The raw PPG signal is
down-sampled and then filtered with a median filter. Then the R peaks are detected automatically
by the algorithm described in section 2.1. The time intervals between two successive R peaks are
referred to as RR intervals or inter-beat intervals (IBI). Based on the obtained RR intervals, statistical
features such as the mean and standard deviation are extracted (Table 1). The variability of the RR intervals
is estimated using frequency-domain analysis. After computing the power spectrum, the band-specific signal
power is estimated in three frequency bands: 1) very-low-frequency band [0, 0.04] Hz; 2) low-frequency band
[0.04, 0.2] Hz; 3) high-frequency band [0.2, 0.4] Hz. The ratio between the signal power in the low
and high bands (LF/HF ratio) is used to estimate the heart rate variability (HRV).
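
As an illustration of the statistical RR features, the following Python sketch computes several of them from a list of RR intervals in milliseconds. The function name `hrv_time_features` and the dictionary keys are hypothetical. The formulas are the commonly used textbook definitions, which the paper does not spell out, so they are an assumption here; the mean heart rate is approximated as 60000 divided by the mean RR interval.

```python
import math
import statistics


def hrv_time_features(rr_ms):
    """Time-domain HRV features from RR (inter-beat) intervals given in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]          # successive differences
    mean_rr = statistics.fmean(rr_ms)
    sdnn = statistics.stdev(rr_ms)                             # SD of the intervals
    sdsd = statistics.stdev(diffs)                             # SD of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    sd1 = math.sqrt(0.5) * sdsd                                # short-term variability
    sd2 = math.sqrt(max(2.0 * sdnn ** 2 - 0.5 * sdsd ** 2, 0.0))  # long-term variability
    return {
        "mean_rr": mean_rr,
        "mean_hr": 60000.0 / mean_rr,   # beats per minute (approximation)
        "sdnn": sdnn,
        "sdsd": sdsd,
        "rmssd": rmssd,
        "pnn50": pnn50,
        "sd1": sd1,
        "sd2": sd2,
    }


# Five illustrative RR intervals (made-up values, not taken from the dataset).
features = hrv_time_features([800, 810, 790, 860, 800])
```

For these five intervals the successive differences are 10, -20, 70 and -60 ms, so two of the four exceed 50 ms in magnitude and pNN50 is 50%.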

In Table 2, we show the EDA-based statistical features. Some of them (indexes 7-15) were
extracted directly from the raw signal, and others were derived based on the number of peaks found by the
peak detector outlined in section 2.1. The power spectra features (index 16) were estimated using the fast
Fourier transform (FFT) applied on a segment level.
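
A band-power feature of this kind can be sketched as follows. To keep the example self-contained, a naive discrete Fourier transform stands in for the FFT; the function name `band_power`, the half-open band convention, and the removal of the mean (which excludes the DC bin) are assumptions, not details taken from the paper.

```python
import cmath
import math


def band_power(x, fs, f_lo, f_hi):
    """Mean-square power of x in the band [f_lo, f_hi) Hz, DC component excluded."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(centred))
            scale = 1.0 if 2 * k == n else 2.0   # the Nyquist bin has no mirror image
            power += scale * abs(coeff) ** 2 / n ** 2
    return power


# Synthetic test signal: amplitude 2 at 1 Hz plus amplitude 1 at 3 Hz, fs = 8 Hz, 8 s.
fs = 8
x = [2 * math.sin(2 * math.pi * 1.0 * i / fs) + math.sin(2 * math.pi * 3.0 * i / fs)
     for i in range(64)]
low = band_power(x, fs, 0.0, 2.4)
high = band_power(x, fs, 2.4, 4.0)
```

A sinusoid of amplitude A carries a mean-square power of A^2/2, so the two calls return approximately 2.0 and 0.5. The same function could provide the numerator and denominator of an LF/HF ratio once the RR series has been resampled to a uniform rate.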


Table 1. The PPG signal features
#      Description                PPG-based features
1-12   Statistical features       Mean heart rate; mean RR; mean, max and min NN interval; pNN50; SDNN; RMSSD;
                                  standard deviation of the difference of successive NN intervals;
                                  SD1 (short-term variability); SD2 (long-term variability); SD1/SD2 ratio
13-21  Frequency-domain features  Power in the (0-0.04) Hz, (0.04-0.15) Hz, and (0.15-0.4) Hz bands; normalised
                                  powers in the three bands; the power in the three bands in per cent;
                                  HRV (LF/HF ratio)


Table 2. The EDA signal features
#      Description                       EDA-based features
1-15   Time-domain statistical features  Number of peaks; mean, max and min conductance; conductance RMS;
                                         standard deviation of the conductance; mean resistance; first
                                         quartile; second quartile; third quartile; interquartile range;
                                         percentiles: 2.5, 10, 90, 97.5
16     Frequency-domain features         Power in the frequency band (0-2.4) Hz


Afterwards, the raw EDA signal is low-pass filtered at 0.2 Hz to separate the tonic level of electrical
conductivity, the skin conductance level (SCL), which reflects variations of the arousal. Subtracting the SCL
component from the raw EDA signal yields the skin conductance response (SCR). The SCR is
the phasic component, which results from the activity of the sympathetic nervous system and refers to faster
signal changes. The SCR peaks are then detected using the algorithm described in section 2.1.
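
The tonic/phasic separation can be sketched in a few lines of Python. The paper does not state which low-pass filter was used, so a first-order recursive filter with a 0.2 Hz cutoff is swapped in here as an assumption; the function name `split_scl_scr` is hypothetical.

```python
import math


def split_scl_scr(eda, fs, cutoff_hz=0.2):
    """Split an EDA signal into a tonic part (SCL) and a phasic part (SCR).

    The SCL is a first-order low-pass filtered copy of the input (an assumed
    filter type); the SCR is the remainder after subtracting the SCL.
    """
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)                 # smoothing factor of the recursive filter
    scl = []
    level = eda[0]
    for value in eda:
        level += alpha * (value - level)   # slow follower tracks the tonic level
        scl.append(level)
    scr = [value - tonic for value, tonic in zip(eda, scl)]
    return scl, scr


# Flat 5 uS baseline with a short 1 uS bump, sampled at 4 Hz (made-up values).
eda = [5.0] * 200
eda[100:104] = [6.0] * 4
scl, scr = split_scl_scr(eda, fs=4)
```

On the flat baseline the SCR stays at zero, while the short bump appears almost entirely in the SCR because the slow follower cannot track it.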


2.3. Post-processing and feature selection 

Dynamic range normalisation is applied to the feature vectors obtained so far. Based on the
assumption of a Gaussian distribution for all features, the dynamic range normalisation is implemented by
subtracting the mean value and dividing by the standard deviation (z-norm). Next, the EDA features are scaled
to the dynamic range [0, 1] by dividing by the maximum value. To discard features that are not relevant to the
acute stress detection task, we carried out feature selection before the classification stage. For that
purpose, we used the adaptive feature selection method outlined in [25] to evaluate the discriminative
capability of individual features. A smaller feature vector brings two benefits: 1) a smaller dataset
suffices for robust model creation, and 2) the computational demand for model creation and classification is
reduced.
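
The normalisation and per-feature ranking steps can be sketched as below. The adaptive selection method of [25] is not reproduced; as a stand-in, the sketch ranks features by the two-class Fisher discriminant ratio and keeps a fixed number of them. The function names and the `keep` parameter are assumptions made for illustration.

```python
import statistics


def z_norm(columns):
    """Z-normalise each feature column: subtract the mean, divide by the standard deviation."""
    normalised = []
    for col in columns:
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col)
        normalised.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return normalised


def fisher_ratio(col, labels):
    """Squared distance of the class means divided by the sum of the class variances."""
    low = [v for v, y in zip(col, labels) if y == 0]
    high = [v for v, y in zip(col, labels) if y == 1]
    num = (statistics.fmean(low) - statistics.fmean(high)) ** 2
    den = statistics.pvariance(low) + statistics.pvariance(high)
    return num / den if den > 0 else float("inf")


def select_features(columns, labels, keep):
    """Return the indices of the `keep` features with the highest Fisher ratio."""
    scores = [fisher_ratio(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Two toy features over four instances; labels: 0 = low stress, 1 = high stress.
columns = [[1.0, 2.0, 1.0, 2.0], [0.0, 0.0, 10.0, 10.5]]
labels = [0, 0, 1, 1]
selected = select_features(z_norm(columns), labels, keep=1)
```

The first toy feature has identical class means and is discarded, whereas the second separates the classes and is retained, so `selected` contains only index 1.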


2.4. Classifier 

We trained binary detectors to discriminate between high and low levels of acute stress in
person-specific and person-independent setups. All detectors used an SVM with a polynomial kernel, which
implements the L1 soft-margin classifier trained with the sequential minimal optimization (SMO) method.
We followed the leave-one-out method and fine-tuned the classifier's adjustable parameters with a grid
search in all experiments. The search ranges were set as follows: box constraint C ∈ [10°°, 10°] with step 10°?,
tolerance ε ∈ {108, 107}, and kernel polynomial order p ∈ {1, 2, 3}.


3. EXPERIMENTAL PROTOCOL 

We performed an experimental evaluation of the acute stress detector in person-specific and
person-independent setups, using the resources described in section 3.1. The performance was evaluated
in terms of detection Accuracy (section 3.2). In the person-specific setup, we report the Average Accuracy for
all people. In all experiments, we used either the full feature vector consisting of the z-norm normalised
features outlined in sections 2.1 and 2.2 or a subset of features selected based on Fisher's discriminant
ratio.


3.1. Dataset and protocol 

In the current study, we used the CLAS dataset [26] due to its particular design: it allows evaluating
the individual's ability to concentrate and successfully solve cognitive assignments under stress. The CLAS
dataset contains a large number of recordings of the physiological response of university students in their
twenties, acquired while they were performing three different cognitive tasks and two emotion-evoking
tasks. Specifically, emotional responses were elicited through audio-visual and picture stimuli balanced in the
four quadrants of the arousal-valence system. The two emotion elicitation tasks used 16 video clips and 16
pictures with known tags, inducing different emotions with a balanced distribution in the valence-arousal
plane. The interactive cognitive tasks include (i) a Math test consisting of a sequence of 24 relatively simple
mathematical problems, (ii) a Stroop test consisting of 30 instances, and (iii) an IQ test with 20 logical
problems. The complexity, the duration of the stimuli, and the limited time for response in the cognitive
tasks, followed by the quick display of the correct answer for each problem, were adjusted to build significant
levels of acute stress. In contrast, the emotion elicitation tasks did not cause a high cognitive load and the
associated acute stress because a response was not required; the participants were expected only to watch the
stimuli.

In the experimental validation, we used the whole blocks of PPG and EDA signals of 56 students
(16 females and 40 males) who have complete sets of recordings. The acute stress models were built using
the stacked blocks recorded during the Math test, Stroop test, and IQ test. The reference model representing
the absence of acute stress was created from the stacked blocks of the non-interactive tasks, i.e., those
associated with emotion elicitation via music video clips and the picture set. We computed the PPG- and
EDA-based features outlined in section 2.2 for signal segments with a duration of 120 sec and an overlap of
60 sec. This frame size was selected in order to provide an adequate frequency resolution for the
spectral-domain features (cf. Tables 1 and 2).
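
The framing step can be written as a short Python helper. The function name `segment` and the choice to drop a trailing incomplete window are assumptions; the 120 sec window and 60 sec overlap are the values stated above.

```python
def segment(signal, fs, win_s=120, overlap_s=60):
    """Cut a signal into windows of win_s seconds that overlap by overlap_s seconds.

    A trailing window shorter than win_s is dropped (an assumption of this sketch).
    """
    win = int(win_s * fs)
    hop = int((win_s - overlap_s) * fs)
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, hop)]


# A 300 sec recording sampled at 4 Hz (illustrative sampling rate).
recording = list(range(300 * 4))
frames = segment(recording, fs=4)
```

With these settings the window starts fall at 0, 60, 120, and 180 sec, giving four frames of 480 samples each. A 120 sec frame also yields a frequency resolution of 1/120 Hz, roughly 0.008 Hz, which is fine enough to resolve the very-low-frequency band.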

In the person-specific setup, we computed the FLD-derived person-specific subsets of normalised
EDA and PPG features using the methods outlined in section 2.3. For each person, we performed
experiments with leave-one-out recording in order to better utilise the available dataset. Specifically, in the
person-specific setup, we carried out four experiments with different settings of the feature normalisation and
feature selection stage (cf. section 2.3): raw feature vector (Fraw), normalised feature vector (Fnorm),
raw feature vector with adaptive FLD attribute selection (FrawFLD), and normalised feature vector with adaptive
FLD attribute selection (FnormFLD). The raw feature vector consists of the genuine 21 PPG and 16 EDA
features as computed (cf. section 2.2). The normalised feature vectors were obtained by applying the
z-norm to the raw features. The adaptive FLD attribute selection was then applied to the raw and
normalised feature vectors. Here, we aimed to find the optimal performance settings, as previous related
studies did not agree on the benefit of normalisation.

In the person-independent setup, the dataset consisted of the merged feature vectors of all 56
persons. We carried out the experimental evaluation using the leave-one-person-out shuffling of the available
data. Specifically, we carried out experiments with raw (Fraw) and normalised (Fnorm) feature vectors using
the entire set of EDA- and PPG-based features, consisting of 37 features (cf. section 2.2). We did not take
advantage of the adaptive FLD attribute selection method because, during the person-specific study, we
observed that the selected features varied significantly among people in both their number
and composition. Thus, the EDA- and PPG-based features selected for different people were dissimilar, and
there was no common subset suitable for most people.
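
The leave-one-person-out protocol can be sketched as a generator over person identifiers. The function name and the identifiers in the example are hypothetical; the sketch only produces index splits and leaves model training aside.

```python
def leave_one_person_out(person_ids):
    """Yield (held_out, train_idx, test_idx), holding out one person at a time.

    person_ids[i] is the identifier of the person who produced feature vector i.
    """
    for held_out in sorted(set(person_ids)):
        train_idx = [i for i, p in enumerate(person_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(person_ids) if p == held_out]
        yield held_out, train_idx, test_idx


# Six feature vectors from three persons (made-up identifiers).
ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(leave_one_person_out(ids))
```

Each fold trains on the data of all other persons and tests on the held-out person, so no person contributes to both sides of a split; with 56 persons this gives 56 folds.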

In both the person-specific and person-independent setups, we considered experiments using the
data of the 40 Males, the 16 Females, and All 56 persons. The aim was to investigate potential
gender-specific differences, if there were any.


3.2. Metrics 
The accuracy of the person-specific detectors for acute stress was computed in percentages as the
weighted sum of the class-specific accuracy obtained for the two classes:

$$Ac = \frac{1}{2}\left(\frac{TN}{N} + \frac{TP}{P}\right) \cdot 100, \; [\%] \qquad (1)$$

where TN is the number of true negative decisions, i.e., the number of correctly detected instances of
low-level acute stress, and TP is the number of correctly detected instances with a high level of acute
stress. N and P are the total number of low-level and high-level acute stress instances, respectively. For the
person-independent detector, we computed the Average Accuracy for all persons in the subset (Males=40;
Females=16; All=56) in percentages:

$$AvAc = \frac{1}{M}\sum_{i=1}^{M} Ac(i), \; [\%] \qquad (2)$$

where M is the number of persons in the subset.
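
The two metrics can be checked with a small Python sketch. It reads the first metric as the mean of the two class-specific accuracies, expressed in percent, and the second as the arithmetic mean of the per-person values; this reading and the function names are assumptions.

```python
def accuracy(tn, n, tp, p):
    """Mean of the class-specific accuracies for the low and high stress classes, in percent."""
    return 0.5 * (tn / n + tp / p) * 100.0


def average_accuracy(per_person):
    """Arithmetic mean of the per-person accuracy values, in percent."""
    return sum(per_person) / len(per_person)


# Toy counts: 18 of 20 low-stress and 9 of 10 high-stress instances detected correctly.
ac = accuracy(tn=18, n=20, tp=9, p=10)
av = average_accuracy([100.0, 90.0, 95.0])
```

Both class-specific accuracies equal 0.9 in the toy example, so the result is 90%; the three per-person values average to 95%. Because each class is weighted equally, the score is not dominated by the larger class.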


4. RESULTS AND DISCUSSION 

In the following subsections, we report the experimental results for the person-specific (section 4.1) 
and the person-independent (section 4.2) setups separately, as these correspond to different application 
scenarios. 


4.1. Person-specific results 

In Table 3, we show the Average Accuracy of acute stress detection in percentages for the four
different sets of features: the raw feature vector (Fraw), the normalised feature vector (Fnorm), the raw
feature vector with adaptive FLD attribute selection (FrawFLD), and the normalised feature vector with
adaptive FLD attribute selection (FnormFLD).


Table 3. Acute stress detection: Average Accuracy in percentages in the person-specific setup
Average Accuracy [%]   Feature vectors
                       Fraw      Fnorm     FrawFLD   FnormFLD
All                    99.20%    99.20%    94.75%    99.72%
Females                98.95%    98.95%    95.50%    99.72%
Males                  99.20%    99.20%    94.72%    99.72%


As shown in Table 3, the highest Average Accuracy (99.72%) was observed for the normalised
feature vectors with adaptive FLD attribute selection (FnormFLD), and this holds for all three subsets {All,
Females, Males}. This high performance is due to the combined effect of the feature vector normalisation
and the FLD-based selection of the person-specific subset of features. In the person-specific experimental
setup, the z-norm permitted a beneficial selection of attributes with the FLD method. This normalisation,
which removes the mean value of the features and scales their dynamic range, facilitated the model
creation and the actual detection processes. In contrast, the FLD-based attribute selection applied directly to
the raw data led to the selection of features that have a significant DC offset (such as the HR and the
EDA-based features), and this caused suboptimal acute stress detection accuracy for all subsets: 94.75%,
95.50%, and 94.72% for All, Females, and Males, respectively. The slightly higher average accuracy for Females
(95.50%) is not significantly different and is due to the smaller number of women (16), which causes a lower
resolution of the accuracy assessment. Finally, the experimental results for the raw feature vector (Fraw) and
the normalised feature vector (Fnorm) without the adaptive FLD attribute selection are lower than those for
FnormFLD for all subsets. This is because these feature vectors contain parameters that are not relevant to
the stress detection task or are not discriminative for the data of specific persons.


4.2. Person-independent results 

In Table 4, we present the acute stress detection Accuracy in percentages for the two
person-independent experiments, with raw (Fraw) and normalised (Fnorm) feature vectors. As shown in Table 4, a
higher stress detection Accuracy was observed for the normalised feature vectors, Fnorm, when compared to
the raw data, Fraw, and this holds for all three subsets {All, Females, Males}. Again, the z-norm was found
beneficial for acute stress detection. This benefit comes from eliminating the mean value of the different
parameters and unifying their dynamic range, which facilitates the modelling and classification
stage. The detection Accuracy computed only for the Males is nearly identical to the one for All, which is
understandable, keeping in mind that the All dataset is not gender-balanced: it contains 2.5 times more data
from Males than from Females.


Table 4. Acute stress detection Accuracy in percentages for the person-independent setup
Accuracy [%]   Feature vectors
               Fraw      Fnorm
All            98.00%    99.68%
Males          98.00%    99.46%
Females        86.44%    100%

It is even more interesting to note the effect of normalisation on the Females dataset, where the
stress detection Accuracy for the raw data, Fraw, was much lower than those of All and Males,
while the normalised feature vectors, Fnorm, provided a higher detection Accuracy than those of
All and Males. This seemingly perplexing result is due partially to the relatively small size of that dataset:
there were only 16 Females. However, juxtaposing both results, we concluded that the observed detection
Accuracy, 86.44% and 100% for Fraw and Fnorm, respectively, is mainly due to the higher variability of the
mean value computed for the Females features, which is effectively eliminated by the z-norm. The slight
difference in detection Accuracy between Females and Males (0.54 percentage points with Fnorm) is primarily
due to the smaller size of the Females dataset, which causes both lower quality of models and a worsened
resolution of the Accuracy estimation.

Finally, quite surprisingly, the person-specific acute stress detection results (Average 
Accuracy=99.72%) and the person-independent acute stress detection Accuracy (99.68%), computed for the 
dataset All, were not statistically different. The numerical similarity between the two stress detection results 
might be due to the relatively limited data in the person-specific setup, where the acute stress models may 
remain undertrained. This opens an interesting research direction for further studies. 


5. CONCLUSION 

We outlined the overall concept and the experimental validation of the proposed acute stress
detection method based on EDA and PPG signals. The experimental results support that we can discriminate
between low and high levels of acute stress caused by cognitive activities. The experimental evaluation in
both person-specific and person-independent setups has validated the practical applicability of the proposed
acute stress detection method in an experimental setup that approximates a personalised learning
environment. Such functionality would facilitate the development of adaptive e-learning environments
that use continuous real-time monitoring of acute stress levels. Estimating the acute stress level would
permit adapting the intensity of the learning process so that the system can manage situations in which high
cognitive load leads to reduced perceptive capability. Furthermore, such
adaptability would permit keeping the trainee in the zone of high concentration and high motivation for a
more extended period, which would enhance the learning performance.